Efficient Access to Non-Sequential Elements of a Search Tree
Abstract
This article describes how a search tree can be extended to allow efficient access to predefined subsets of the stored elements. This is achieved by marking some of the elements of the search tree with marker bits. We show that our approach does not affect the asymptotic logarithmic complexity of existing operations. At the same time, the modified search tree can efficiently support requests on predefined subsets of the stored elements that it previously could not.

Keywords: marker bits; search trees; data structures

I. INTRODUCTION

A balanced search tree, such as an AVL tree [1], an AA tree [2], or a B-tree [3], allows efficient retrieval of elements that are consecutive relative to an in-order traversal of the tree. However, there is no obvious way to efficiently retrieve the elements that belong to a predefined subset of the stored elements if they are not sequential in the search tree. For example, consider a database that stores information about company employees. A search tree may store information about the employees ordered by age. This search tree can be used to retrieve all the employees sorted by age, but it does not efficiently support a request for all rich employees (e.g., those making more than $100,000 per year) sorted by age. In this paper, we show how the example search tree can be extended with marker bits so that both requests can be supported efficiently.

The technique proposed in this paper increases the set of requests that a search tree can support efficiently. This means that fewer search trees need to be built, which not only saves space but also improves update performance.

Naive solutions to the problem fail. For example, it is not enough to mark all the nodes of the search tree that contain data elements belonging to the subsets of the data that we are interested in.
This approach will not allow us to prune any subtrees, because a parent node may not belong to an interesting subset while its child nodes do.

To the best of our knowledge, a detailed explanation of how marker bits work has not been previously published. Our previous work [5] briefly introduces the concept of marker bits, but it does not explain how marker bits can be maintained after insertion, deletion, and update. Other existing approaches handle requests on different subsets of the search tree elements by exhaustive search or by creating additional search trees. However, the second approach leads not only to unnecessary duplication of data, but also to slower updates, because multiple copies of the same data must be updated.

Given a subset S of the search elements, our approach marks every node in the tree that contains an element of S or that has a descendant that contains an element of S. These additional marker bits only slightly increase the size of the search tree (by one bit per tree node), but they allow efficient logarithmic execution of requests that ask for the elements of S in the tree order.

In what follows, Section II presents core definitions, Section III describes how to perform different operations on a search tree with marker bits, and Section IV contains the conclusion.

II. DEFINITIONS

Definition 1 (MB-tree): An MB-tree has the following syntax: $((S_1, \ldots, S_s), S, O)$, where $S$ and $\{S_i\}_{i=1}^{s}$ are sets over the same domain $\Delta$, $S_i \subseteq S$ for $i \in [1..s]$, and $O$ is a total order over $\Delta$. This represents a balanced search tree of the elements of $S$ (every node of the tree stores a single element of $S$), where the in-order traversal of the tree produces the elements according to the order $O$. In addition, every node of the tree contains $s$ marker bits, and the $i$th marker bit is set exactly when the node or one of its descendants stores an element that belongs to $S_i$. We will refer to this property as the marker bit property.
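As an illustration of Definition 1, the following minimal sketch computes the marker bits of a small tree in one bottom-up pass and then checks the marker bit property by brute force. It is not the paper's implementation: the names `Node`, `build_balanced`, `set_markers`, and `holds_marker_property` are hypothetical, elements are plain integers under their natural order, and a statically built balanced binary tree stands in for an AVL or AA tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Set


@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    # One marker bit per subset S_i (hypothetical in-memory layout).
    markers: List[bool] = field(default_factory=list)


def build_balanced(sorted_values: Sequence[int]) -> Optional[Node]:
    """Build a height-balanced binary search tree from values sorted by the order O."""
    if not sorted_values:
        return None
    mid = len(sorted_values) // 2
    return Node(
        value=sorted_values[mid],
        left=build_balanced(sorted_values[:mid]),
        right=build_balanced(sorted_values[mid + 1:]),
    )


def set_markers(node: Optional[Node], subsets: Sequence[Set[int]]) -> List[bool]:
    """Bottom-up pass: bit i is set iff the node or one of its descendants is in S_i."""
    if node is None:
        return [False] * len(subsets)
    left_bits = set_markers(node.left, subsets)
    right_bits = set_markers(node.right, subsets)
    node.markers = [
        (node.value in s) or lb or rb
        for s, lb, rb in zip(subsets, left_bits, right_bits)
    ]
    return node.markers


def subtree_values(node: Optional[Node]) -> List[int]:
    """Return the values stored in the subtree rooted at node, in order."""
    if node is None:
        return []
    return subtree_values(node.left) + [node.value] + subtree_values(node.right)


def holds_marker_property(node: Optional[Node], subsets: Sequence[Set[int]]) -> bool:
    """Brute-force check of the marker bit property at every node."""
    if node is None:
        return True
    stored = subtree_values(node)
    expected = [any(v in s for v in stored) for s in subsets]
    return (
        node.markers == expected
        and holds_marker_property(node.left, subsets)
        and holds_marker_property(node.right, subsets)
    )


# Illustrative data: S = {1, ..., 7}, S_1 = {1, 7}, S_2 = {4}.
subsets = [{1, 7}, {4}]
root = build_balanced(list(range(1, 8)))
set_markers(root, subsets)
```

In this sketch the root stores 4 and ends up with both bits set, while the node storing 3 has both bits cleared, so a request for either subset can skip that subtree entirely.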
The above definition can be trivially extended to allow an MB-tree to have multiple data values in a node, as is the case for a B-tree, but this is beyond the scope of this paper.

Going back to our motivating example, consider the MB-tree ((RICH_EMPS), EMPS, (age)). This represents a search tree of the employees, ordered by the attribute age in ascending order. The RICH_EMPS set consists of the employees who make more than $100,000 per year. Figure 1 shows an example instance of this MB-tree. Each node of the tree contains the name of the employee, followed by their age and salary. The value of the marker bit is shown above each node.

[Figure 1: example instance of the MB-tree. Node labels recoverable from the source include "Peter, 22, $20,000" and "Dave, 30, $20,000".]
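The motivating request (rich employees sorted by age) can be sketched as an in-order traversal that skips every subtree whose marker bit is cleared. This is an illustrative sketch under stated assumptions, not the paper's algorithm: `Employee`, `Node`, `build`, and `rich_by_age` are hypothetical names, the tree is built statically rather than maintained under updates, and only Peter and Dave come from the source figure; the other employees are invented for the example.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional

RICH_THRESHOLD = 100_000  # "more than $100,000 per year", as in the text


@dataclass
class Employee:
    name: str
    age: int
    salary: int


@dataclass
class Node:
    emp: Employee
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    rich_bit: bool = False  # the single marker bit, for RICH_EMPS


def is_rich(emp: Employee) -> bool:
    return emp.salary > RICH_THRESHOLD


def build(emps_by_age: List[Employee]) -> Optional[Node]:
    """Build a balanced tree from employees sorted by age, setting marker bits bottom-up."""
    if not emps_by_age:
        return None
    mid = len(emps_by_age) // 2
    node = Node(emps_by_age[mid], build(emps_by_age[:mid]), build(emps_by_age[mid + 1:]))
    node.rich_bit = (
        is_rich(node.emp)
        or (node.left is not None and node.left.rich_bit)
        or (node.right is not None and node.right.rich_bit)
    )
    return node


def rich_by_age(node: Optional[Node], visited: Optional[List[str]] = None) -> Iterator[Employee]:
    """In-order traversal that prunes any subtree whose marker bit is cleared."""
    if node is None or not node.rich_bit:
        return
    if visited is not None:
        visited.append(node.emp.name)  # record which nodes were actually entered
    yield from rich_by_age(node.left, visited)
    if is_rich(node.emp):
        yield node.emp
    yield from rich_by_age(node.right, visited)


# Peter and Dave appear in the source figure; the remaining rows are hypothetical.
employees = [
    Employee("Peter", 22, 20_000),
    Employee("Dave", 30, 20_000),
    Employee("Ann", 35, 120_000),
    Employee("Bob", 41, 50_000),
    Employee("Carol", 48, 150_000),
    Employee("Eve", 55, 30_000),
    Employee("Frank", 60, 40_000),
]
root = build(employees)
visited: List[str] = []
rich_names = [e.name for e in rich_by_age(root, visited)]
print(rich_names)  # ['Ann', 'Carol']
```

With this data the traversal never enters the nodes for Peter and Frank, because their marker bits are cleared. Marking only the rich nodes themselves would not permit this pruning, since an unmarked parent such as Dave can still have a rich child.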